 data analysis and machine learning


Applications of Entropy in Data Analysis and Machine Learning: A Review

arXiv.org Machine Learning

Since its origin in the thermodynamics of the 19th century, the concept of entropy has also permeated other fields of physics and mathematics, such as Classical and Quantum Statistical Mechanics, Information Theory, Probability Theory, Ergodic Theory and the Theory of Dynamical Systems. Specifically, we are referring to the classical entropies: the Boltzmann-Gibbs, von Neumann, Shannon, Kolmogorov-Sinai and topological entropies. In addition to their common name, which is historically justified (as we briefly describe in this review), another commonality of the classical entropies is the important role that they have played and are still playing in the theory and applications of their respective fields and beyond. Therefore, it is not surprising that, in the course of time, many other instances of the overarching concept of entropy have been proposed, most of them tailored to specific purposes. Following the current usage, we will refer to all of them, whether classical or new, simply as entropies. Precisely, the subject of this review is their applications in data analysis and machine learning. The reason for these particular applications is that entropies are very well suited to characterize probability mass distributions, typically generated by finite-state processes or symbolized signals. Therefore, we will focus on entropies defined as positive functionals on probability mass distributions and provide an axiomatic characterization that goes back to Shannon and Khinchin. Given the plethora of entropies in the literature, we have selected a representative group, including the classical ones. The applications summarized in this review finely illustrate the power and versatility of entropy in data analysis and machine learning.
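As a rough illustration of an entropy acting as a functional on a probability mass distribution, the sketch below computes the Shannon entropy of a distribution estimated from a symbolized signal. The function names `shannon_entropy` and `empirical_pmf` are hypothetical helpers chosen for this example and are not taken from the review.

```python
import math
from collections import Counter


def shannon_entropy(pmf, base=2.0):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i) of a probability mass distribution.

    Terms with p_i == 0 are skipped, following the convention 0 * log(0) = 0.
    """
    if any(p < 0 for p in pmf) or not math.isclose(sum(pmf), 1.0, abs_tol=1e-9):
        raise ValueError("pmf must be non-negative and sum to 1")
    return -sum(p * math.log(p, base) for p in pmf if p > 0)


def empirical_pmf(symbols):
    """Estimate a probability mass distribution from relative symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


# A symbolized signal over the alphabet {a, b}, with both symbols equally frequent.
signal = "abababab"
entropy_bits = shannon_entropy(empirical_pmf(signal))
```

With base 2 the result is measured in bits, so a uniform distribution over two symbols gives an entropy close to 1 and a uniform distribution over four symbols gives a value close to 2.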


Top 10 Python Libraries that every Data Scientist should know

#artificialintelligence

Python has become the most popular language for Data Scientists. This is because it is easy for beginners to learn and has a wide range of libraries that can be used for web development, scientific computing, data analysis, artificial intelligence, and more. In this blog, we will discuss the top 10 Python libraries that every Data Scientist should learn. NumPy is a library for scientific computing in Python. It is used for working with arrays and matrices.
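A minimal sketch of what working with arrays and matrices in NumPy can look like, assuming NumPy is installed. The variable names and values are made up for illustration and do not come from the blog post.

```python
import numpy as np

# A 2x2 matrix and a length-2 vector stored as NumPy arrays.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
vector = np.array([1.0, -1.0])

# Element-wise arithmetic applies to every entry at once.
doubled = matrix * 2

# The @ operator performs matrix-vector (or matrix-matrix) multiplication.
product = matrix @ vector

# Reductions can run along an axis; axis=0 averages down each column.
column_means = matrix.mean(axis=0)
```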


Feature Leakage, and identifying it with Exploratory data analysis and Machine Learning

#artificialintelligence

On one of my projects, my team and I were tasked with building a mortgage leads generation model for a client -- a quite standard project in the banking industry. The data shared with us, on the other hand, were not safe or fit for modelling straightaway: the data had been compiled from different sources, "possibly from different time periods" too. This might seem like an exceptional situation. In reality, however, it is all too common to acquire data from the client and take for granted their fitness for modelling. In our situation, the comment from our client meant that we were potentially looking at feature leakage.


Using Data Analysis and Machine Learning to Identify Violence Zones in Somalia

#artificialintelligence

The conflicts in Somalia have reached alarming levels: year after year, many people are victimized by disputes over territory and dominance of spaces. The problem has reached levels that the international community considers intolerable. This report aims to inform intervention actions through insights that serve as strategic tools for facing these problems. The work is part of Omdena's AI challenge in partnership with the UNHCR -- The UN Refugee Agency. The data is derived from a wide variety of local, regional and national sources and the information is collected by trained data experts around the world.


Machine learning job: NLP Researcher at Casafari (Lisbon, Portugal)

#artificialintelligence

AI/ML Job: NLP Researcher at Casafari, Lisbon, Portugal (Posted Aug 9 2018). About the company: Casafari tracks the entire real estate market by aggregating and matching properties from over 2000 different sources. We provide both investors and real estate professionals with clean, hyper-local data in real time. Our clients identify the best investment opportunities. Agents close deals 10 times faster. Our clients see our meta search as Trivago for real estate, and the largest, cleanest artificial MLS in Europe.


Learning Path: R: Data Analysis and Machine Learning with R

@machinelearnbot

Tim Hoolihan currently works at DialogTech, a marketing analytics company focused on conversations. He is the senior director of data science there. Prior to that, he was CTO at Level Seven, a regional consulting company in the US Midwest. He is the organizer of the Cleveland R User Group. In his job, he uses deep neural networks to help automate a lot of conversation classification problems. In addition, he works on some side projects researching other areas of artificial intelligence and machine learning.


Python Data Analysis and Machine Learning - Alexandre Gravier

#artificialintelligence

Our brains are good at letting us navigate the physical world and interact with each other because their specialized mechanisms and structure are the result of selective competition. Because of this structure, the brain, at birth, is not an empty box ready to be filled with knowledge pouring from our senses. Instead, the brain is more like the rough outline of a fully functional mind. The silhouette is there; the details just need to be carved out. This analogy is quite good, as a lot of the early learning and brain development consists of pruning unused neural connections.